Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

Abstract

An effective way to increase noise robustness in automatic speech recognition (ASR) systems is feature enhancement based on an analytical distortion model that describes the effects of noise on the speech features. One such distortion model, reported to achieve a good trade-off between accuracy and simplicity, is the masking model. Under this model, the speech distortion caused by environmental noise is treated as a spectral mask, so that each noisy speech feature is either reliable (speech is not masked by noise) or unreliable (speech is masked). In this paper, we present a detailed overview of this model and its applications to noise-robust ASR. First, using the masking model, we derive a spectral reconstruction technique aimed at enhancing the noisy speech features. Two problems must be solved in order to perform spectral reconstruction using the masking model: (1) mask estimation, i.e. determining the reliability of the noisy features, and (2) feature imputation, i.e. estimating speech for the unreliable features. Unlike missing-data imputation techniques, which treat the two problems as independent, our technique addresses them jointly by exploiting a priori knowledge of the speech and noise sources in the form of a statistical model. Second, we propose an algorithm for estimating the noise model required by the feature enhancement technique. The proposed algorithm fits a Gaussian mixture model to the noise by iteratively maximising the likelihood of the noisy speech signal, so that noise can be estimated even during speech-dominated frames. A comprehensive set of experiments carried out on the Aurora-2 and Aurora-4 databases shows that the proposed method achieves significant improvements over the baseline system and over other similar missing-data imputation techniques.
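To make the idea of jointly estimating the mask and imputing masked speech more concrete, the following is a minimal sketch for a single log-spectral feature. It is not the paper's algorithm: it assumes the common log-max form of the masking model, y = max(x, n), and it replaces the statistical speech and noise models with one Gaussian each (the abstract fits a Gaussian mixture to the noise). The function name `reconstruct_feature` and all parameter names are hypothetical and chosen only for illustration.

```python
import math


def norm_pdf(v, mean, std):
    """Gaussian density N(v; mean, std^2)."""
    z = (v - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def norm_cdf(v, mean, std):
    """Gaussian cumulative distribution P(value <= v)."""
    return 0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0))))


def reconstruct_feature(y, speech_mean, speech_std, noise_mean, noise_std):
    """Soft mask and speech estimate for one noisy feature y.

    Assumption (not taken from the source): y = max(x, n), with clean
    speech x and noise n modelled as independent Gaussians.
    Returns (reliability, estimate).
    """
    speech_pdf = norm_pdf(y, speech_mean, speech_std)
    speech_cdf = norm_cdf(y, speech_mean, speech_std)

    # Hypothesis "reliable": speech equals y and noise lies below y.
    p_reliable = speech_pdf * norm_cdf(y, noise_mean, noise_std)
    # Hypothesis "masked": noise equals y and speech lies below y.
    p_masked = norm_pdf(y, noise_mean, noise_std) * speech_cdf

    total = p_reliable + p_masked
    if total <= 0.0:
        # Both likelihoods underflowed; fall back to trusting the observation.
        return 1.0, y

    # Mask estimation: posterior probability that the feature is reliable.
    reliability = p_reliable / total

    # Feature imputation: mean of the speech Gaussian truncated to x <= y.
    if speech_cdf > 1e-300:
        masked_mean = speech_mean - speech_std ** 2 * speech_pdf / speech_cdf
    else:
        # Far below the speech mean the truncated mean approaches y.
        masked_mean = y

    # Both sub-problems are combined in a single weighted estimate.
    estimate = reliability * y + (1.0 - reliability) * masked_mean
    return reliability, estimate


if __name__ == "__main__":
    # Speech-dominated feature: observation far above the noise level.
    print(reconstruct_feature(10.0, 10.0, 2.0, 0.0, 1.0))
    # Noise-dominated feature: observation sits at the noise level.
    print(reconstruct_feature(5.0, 0.0, 3.0, 5.0, 1.0))
```

In this sketch the reliability weight plays the role of a soft mask, and the truncated-Gaussian mean plays the role of the imputed value, so the estimate never exceeds the observed feature. A fuller treatment along the lines the abstract describes would use mixture models for both sources and sum over their components.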